TABLE OF CONTENTS
Description
Syntax
  Labels
  Opcodes
  Operands
  Pseudo ops
Error Messages
Source code
Disclaimer
Revisions
Op code table
ASSEMBLE.COM
DESCRIPTION
This is a two pass assembler written in Turbo Pascal. It
was written because I was a new owner of an IBM PCjr with little
software and was not aware of any public domain assemblers that
had reasonable performance. Additionally I had recently purchased
Turbo Pascal from Borland International and wanted a project to
help me learn Pascal.
This is a simple assembler for the Intel 8088/8086 instruction
set. It closely follows the syntax of the instruction set
described in THE 8086 BOOK by Russell Rector and George Alexy. It
is also patterned after CHASM.BAS version 1.9, written in BASIC by
David Whitman of Whitman Software (there have been numerous
changes to avoid copyright violations). It is not a macro
assembler and therefore recognizes only a few pseudo op codes.
A complete list of the op codes and pseudo ops recognized by
ASSEMBLE.COM appears later in this documentation.
The input to ASSEMBLE.COM is an ordinary DOS text file,
which can be created with most text editors. The output is a
listing sent to the screen, printer or a disk file, plus a .COM
file. This assembler was designed to create executable programs
and assembly code for use in BASIC or Turbo Pascal programs. It
will not generate code that can be linked to other programs. See
your BASIC or Turbo Pascal manuals for how to include executable
code in your programs. Since this assembler is intended for small
projects, you will probably find that bypassing the link step and
the conversion from .EXE to .COM is a convenience. If you intend
to write large software projects, I recommend you get a macro
assembler such as IBM's or CHASM version 4.0 (also written in
Turbo Pascal). Although the assembler is intended for small
projects, both the input file and the output file(s) are on disk,
so the only limits on the size of your program are disk space,
the number of labels you use and your endurance. Labels and their
memory addresses are stored on the first pass in a data array.
This array is limited to 400 labels. My goal is to keep this
program under 35K bytes so it can be run on small systems. If you
have the memory, this program can easily be recompiled with a
larger array defined. See the notice at the end for the source
code.
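The first-pass bookkeeping described above can be sketched in Python. This is only an illustration, not the assembler's Turbo Pascal code: the function name first_pass and the pre-parsed input of (label, size) pairs are assumptions made for the example, while the 400-label limit and the 100H origin come from this document.

```python
MAX_LABELS = 400   # size of the label array stated in this document
ORIGIN = 0x100     # default start address of a .COM program

def first_pass(lines):
    """Assign an address to every label.

    `lines` is a hypothetical pre-parsed form of the source:
    a list of (label or None, instruction size in bytes) pairs.
    """
    table = {}
    location = ORIGIN
    for label, size in lines:
        if label is not None:
            if len(table) >= MAX_LABELS:
                raise OverflowError("label table full")
            table[label] = location
        location += size
    return table

# Four source lines, two of them labelled; the sizes are made up.
program = [("START", 3), (None, 2), ("LOOP1", 2), (None, 2)]
symbols = first_pass(program)
```

A second pass would then look up each label in the finished table while generating the object code.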
SYNTAX
Each line of source code begins with a label or blank space
followed by the op code (8088 instruction or pseudo op) and then
the operand(s) (if required by the opcode). The source code may be
followed with a semicolon (;) and any comments for that line.
Optionally a line may consist only of a comment if it starts with
a semicolon. A blank space must precede and follow the op code field.
For readability I recommend one or more spaces between the label
and the op code and between the op code and the operand. A comma
must be used to separate operands. The requirement for a comma
between operands differs from CHASM and should be noted when
converting programs.
Example
1stLabel mov ax,90H ;format example
LABELS
Labels may be longer, but only the first 12 characters of a
label are stored for future reference, so any labels whose first
twelve characters are the same will cause a duplicate label
error. Also, since the line parsing routine converts all source
code not in single quotes to upper case before decoding the line,
you cannot use upper and lower case to distinguish labels. For
example, LongLabelxxx1 and longlabelxxx2 are both stored as
LONGLABELXXX and would therefore cause a duplicate label error.
If you are going to use numbers to distinguish labels, I suggest
you put them at the beginning of the label, such as 2LongLabelXXX.
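The label rule above (upper case, first twelve characters significant) can be checked with a few lines of Python. The helper names stored_name and find_duplicates are invented for this sketch and are not part of ASSEMBLE.COM.

```python
LABEL_LENGTH = 12  # significant characters, per this document

def stored_name(label):
    """Return the form under which a label is kept: upper case, first 12 characters."""
    return label.upper()[:LABEL_LENGTH]

def find_duplicates(labels):
    """Return (first, later) pairs of source labels that collide once stored."""
    seen = {}
    clashes = []
    for label in labels:
        key = stored_name(label)
        if key in seen:
            clashes.append((seen[key], label))
        else:
            seen[key] = label
    return clashes

clashes = find_duplicates(["LongLabelxxx1", "longlabelxxx2", "2LongLabelXXX"])
```

Here the first two names collide, while the name with the digit in front stays distinct, which is why putting the number first is the safer habit.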
OP CODES
All of the op codes specified in THE 8086 BOOK are supported;
however, the syntax was modified to resolve some ambiguous op
codes. The first ambiguous op code is JMP, which can be either an
8 or 16 bit displacement jump. Eight bit displacement jumps are
specified with JMPS (short jump). Jumps using mem/reg (indirect)
addressing for their destination must specify Near or Far to
indicate a jump within the current CS or an intersegment jump.
These jumps are coded as JMPN or JMPF. The same logic is used for
CALLN and CALLF when using the mem/reg addressing mode. The other
major area of ambiguity comes from op codes that do not specify a
register as either the destination or source. This assembler
requires you to append a B or W to the op code to distinguish
between bytes and words, e.g. MOVSB to move a string of bytes or
MOVW [bx],8 to load 8 into the word address pointed to by BX.
Normally all data moves are assumed to be relative to the DS
(data segment) register. This default can be overridden one
instruction at a time by using the SEG op code on the line before
the instruction to be overridden.
Example
SEG ES
MOV AX,[BX]
This moves a word into the accumulator from the address in the
extra segment offset by the BX register. This is a little used
function since ASSEMBLE assumes all of the segment registers are
set to the same location, as is required at the start of a .COM
program. Access to system resources should be done with BIOS or
DOS calls when possible rather than by going directly to a
hardware memory location outside your program.
The op code table lists the available op codes and pseudo ops
and the various addressing modes associated with each. Please
note that the mem/reg addressing mode includes several sub modes
such as base relative (using BX as an offset), stack relative
(using BP as an offset), and indexed (using SI or DI). See
THE 8086 BOOK for an explanation of each mode.
OPERANDS
The operands describe to the assembler the destination and
source of the data to be operated on. The 8088 uses a number of
addressing modes to determine where that data is and should go.
As you will discover by looking at the op code table, not all
modes can be used with every op code. The addressing modes are:
Accumulator - data is transferred to/from the accumulator.
Displacement - the displacement value is added to the present IP.
Immediate - data is assembled into the instruction.
Memory/Reg - data is transferred to/from address pointed to
by [mem] or [reg].
Register - data is transferred to/from the register.
Labels can be used in the Immediate, Displacement and Memory
addressing modes. This assembler follows the Intel convention of
treating a label operand as a reference to the value in a
memory location unless it was defined by an EQU. If a value is
added to that label it is then treated as a reference to the
location address. To use an offset to the label to obtain a value
in a memory location, the label and offset must be enclosed in
brackets just as a numeric value would be.
Operand         Meaning
Label           reference the value in the memory location
Offset Label    reference the memory location
Label+5         reference the memory location at label+5.
                This is the same as Offset Label+5.
$-Label         reference a memory location or a numeric
                value, e.g. when defining a buffer length.
[Label+5]       reference the value in memory location label+5
[1234]          reference the value in memory location 1234
Addition, subtraction, multiplication and division are supported
by the parser. Labels are treated as numbers when used in a math
expression.
Accumulator: The accumulator(s) are AX or AL and AH where AX is a
16 bit accumulator, AL is the lower 8 bits of AX and AH is the
higher 8 bits.
DISPLACEMENT
A displacement value to be added to the instruction pointer
(IP) is included as immediate data in the instruction. The
assembler calculates the displacement from the location of the
instruction and the location of the address in the operand. The
address in the operand can be expressed as a number (binary, hex
or decimal) but is most commonly expressed as a label.
Example
LABEL MOV AX,[BX]
CMP AX,10H
JLE LABEL
In this example the assembler calculates a negative
displacement to jump back to LABEL when the value in AX is less
than or equal to 10H.
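The displacement arithmetic can be sketched in Python. On the 8086 a short or conditional jump stores a signed byte measured from the address of the instruction that follows the jump. The function name short_displacement, the address chosen for LABEL and the instruction sizes (a 2-byte MOV and a 3-byte CMP) are assumptions for illustration; the encodings ASSEMBLE.COM actually emits may differ.

```python
def short_displacement(jump_address, target_address, instruction_length=2):
    """Return the signed 8-bit displacement for a short or conditional jump."""
    # The CPU adds the displacement to the address of the next instruction.
    displacement = target_address - (jump_address + instruction_length)
    if not -128 <= displacement <= 127:
        raise ValueError("Too far for short jump")
    return displacement

label = 0x100              # address of LABEL (assumed)
jle = label + 2 + 3        # JLE follows a 2-byte MOV and a 3-byte CMP (assumed sizes)
disp = short_displacement(jle, label)
encoded = disp & 0xFF      # two's complement byte stored in the instruction
```

Under these assumptions the jump goes back seven bytes, and a target outside the -128 to +127 range is rejected in the same spirit as the 'Too far for short jump' error.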
IMMEDIATE
All immediate data is assembled into the instruction
code. This data can be represented in two ways. First, immediate
data can be written in binary, decimal or hexadecimal format in
a signed range of -32768 to 32767 (8000H to 7FFFH) or, if the sign
bit is not used, 0 to 65535 (0000H to FFFFH). As in these
examples, an 'H' is appended to the number to indicate hexadecimal.
Binary numbers are expressed as a series of up to 16 ones and
zeros followed by a B, e.g. 11010B represents 26. The other method
of representing immediate data is with labels. The value of the
label is the address at which the label was defined or the value
assigned to the label in an EQU pseudo op.
Example
Label equ 10
Here db 10
MOV BX,LABEL ;Load BX with the value of Label
MOV BX,OFFSET(Here) ;Load BX with the address of Here
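The number formats described in this section can be illustrated with a small Python parser. It is a sketch of the notation only (H suffix for hexadecimal, B suffix for binary, otherwise decimal); parse_number and to_word are hypothetical names, and the assembler's own parsing routine may behave differently in edge cases.

```python
def parse_number(text):
    """Convert a literal such as '90H', '11010B' or '26' to an integer."""
    token = text.strip().upper()
    if token.endswith("H"):          # hexadecimal, e.g. 7FFFH
        return int(token[:-1], 16)
    if token.endswith("B"):          # binary, e.g. 11010B
        return int(token[:-1], 2)
    return int(token, 10)            # plain decimal

def to_word(value):
    """Map a value in -32768..65535 onto the 16-bit pattern that is assembled."""
    if not -32768 <= value <= 65535:
        raise ValueError("does not fit in 16 bits")
    return value & 0xFFFF

binary_example = parse_number("11010B")
lowest_signed = to_word(parse_number("-32768"))
```

The H test comes first so that a hexadecimal literal ending in the digit B, such as 1BH, is not mistaken for a binary number.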
MEMORY/REGISTER
This addressing mode is also called indirect addressing. The
operand points to a memory location that contains the data,
rather than the instruction containing the data as in the
immediate addressing mode. The operand can be a memory location
expressed as a label, decimal number or hexadecimal number, or
it can be a memory location pointed to by a register. The
following indirect modes are allowed:
MOV Reg,[BP]
MOV [BX],Reg
MOV [BX+SI],Reg ;BX plus SI displacement equals location
MOV Reg,[BX+DI] ;BX plus DI " " "
MOV Reg,[BP+SI] ;BP plus SI " " "
MOV [BP+DI],Reg ;BP plus DI " " "
MOV [DI],Reg
MOV Reg,[SI]
MOV LABEL,Reg
MOV [1234],Reg
Any of the general purpose registers can be used in place of Reg
in these examples. Immediate data may also be substituted for a
source register; however, the op code must then be appended with W
or B so the assembler knows whether you are pointing to a word or
byte address. In addition, when an indirect address using a
register is chosen, a displacement may also be used.
Example
MOV Reg,10H[BP] ;Source address equal BP+10H
MOV Reg,-5[BP+SI] ;Source address equal BP+SI-5
DEMO EQU FFH
MOV DEMO[BX+DI],Reg ;Destination address equal BX+DI+255
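The address arithmetic behind the three displacement forms above can be worked through in Python. The register contents are sample values chosen for this sketch, and effective_address is a hypothetical helper; the only hardware fact relied on is that the 8086 wraps the sum at 16 bits.

```python
def effective_address(registers, parts=(), displacement=0):
    """Add the named base/index registers and the displacement, wrapping at 16 bits."""
    total = displacement
    for name in parts:
        total += registers[name]
    return total & 0xFFFF

# Sample register contents, chosen only for the example.
regs = {"BX": 0x1000, "BP": 0x2000, "SI": 0x0010, "DI": 0x0020}
DEMO = 0xFF  # DEMO EQU FFH

a = effective_address(regs, ["BP"], 0x10)         # 10H[BP]
b = effective_address(regs, ["BP", "SI"], -5)     # -5[BP+SI]
c = effective_address(regs, ["BX", "DI"], DEMO)   # DEMO[BX+DI]
```

The result is an offset within a segment; which segment register applies (DS for BX forms, SS for BP forms, unless overridden with SEG) is outside the scope of this sketch.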
REGISTER
In this addressing mode the data is contained in or is to be
stored in one of the 8088 registers. The registers are AX
(AL+AH), BX (BL+BH), CX (CL+CH), DX (DL+DH), BP, DI, SI and the
four segment registers CS, DS, ES, SS. All math operations use
the accumulator (AX, AL or AH); in addition, MUL and DIV use the
DX register when 32 bit numbers are involved. The BX and BP
registers can be used as base pointers in the Data or Stack
segments respectively. The CX register can be used as an
automatic counter for some instructions. As demonstrated in
earlier examples the SI and DI registers can be used as indexing
registers. Any of the numerous assembly language books for the
IBM PC or PCjr will give you an explanation of each of the
processor registers and their uses.
PSEUDO OPCODES
Pseudo opcodes are assembler directives that control the
generation of the object code. The available pseudo ops are DB,
DS, DW, ENDP, EQU, ORG and PROC.
DB = define byte and has operands of one or more bytes and/or
a string ( DB 20H,'Demo' ). Strings are set off by single
quotes. Numbers must be less than 256 and may be expressed in
binary, decimal or hexadecimal.
DS = define storage and initializes a string of memory
locations. The first operand defines the number of bytes to
be initialized. If included the second operand defines the
value the memory is to be initialized to. The default value
is zero. ( DS 20,FFH ;initialize 20 bytes to 255).
DW = define word and its operand(s) must be a number or a
label. With DW the low order byte is stored first in memory
as this is the format used by the 8088 for integer storage
(i.e. dw 1020H = db 20H,10H).
EQU= define the value of a label. All label definitions must
occur at the beginning of your program or errors may occur
in the assembly process. The most common error message
received from defining a label late in the program is 'PHASE
ERROR'. A phase error indicates the assembler generated an
address for a label on the second pass different from that
of the first pass.
ORG= reset the location counter to a new origin. Since all .COM
programs start at 100H, the default setting for ASSEMBLE.COM
is 100H; however, you may need to start at 00H for a
driver routine or a machine language routine for BASIC.
PROC and ENDP are used together to define a program or procedure
as Near or Far. This information is used to determine the
type of return to be generated when a RET is encountered.
If no procedure is defined a Near procedure is assumed. The
syntax is:
PROC NEAR ;Proc must be followed by Near or Far
....
....
ENDP
If PROC is used an ENDP must be used.
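The bytes produced by the data directives DB, DS and DW described above can be modelled in Python. This is a sketch of the rules as stated in this section, not the assembler's code; the function names db, ds and dw simply mirror the pseudo ops, and reusing the 'Data too long' message for an out-of-range DB value is an assumption.

```python
import struct

def db(*items):
    """Bytes for DB: each item is a number below 256 or a string."""
    out = bytearray()
    for item in items:
        if isinstance(item, str):
            out += item.encode("ascii")
        elif 0 <= item <= 255:
            out.append(item)
        else:
            raise ValueError("Data too long")
    return bytes(out)

def ds(count, fill=0):
    """Bytes for DS: `count` bytes, all set to `fill` (default zero)."""
    return bytes([fill]) * count

def dw(*values):
    """Bytes for DW: each word is stored low order byte first."""
    return b"".join(struct.pack("<H", value & 0xFFFF) for value in values)

demo = db(0x20, "Demo")      # DB 20H,'Demo'
filled = ds(20, 0xFF)        # DS 20,FFH
word = dw(0x1020)            # DW 1020H
```

Comparing dw(0x1020) with db(0x20, 0x10) reproduces the equivalence given in the DW entry.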
ERROR MESSAGES
All error messages and diagnostics are printed immediately
before the line in which the error occurs. The total number of
error and diagnostic messages will be displayed at the end of the
source code print out immediately prior to the symbol table dump.
I have made an attempt to make error messages as user
friendly as possible. The most cryptic of the error messages is
the one you receive when there is a syntax error. This message
consists of the op code and ASSEMBLE.COM's interpretation of the
type of data included in the operands. For example the message
*** Syntax Error: MOV (16 bit immediate or 8 bit immediate), (none)
would appear immediately before a line of code containing the
instruction MOV 45H. By reviewing the type of data and the
allowable operands for each instruction you should be able to
locate the error. In the instruction above both a destination and
a source operand are required, and if immediate data is used it
must be the source operand.
'Phase Error' is most commonly caused by referencing an
equate before defining it. I strongly recommend you only use the
EQU pseudo op at the beginning of your source code. This practice
should prevent this error and will make your source code easier
to read.
'Error: EQU without symbol' is received when you use the equate
pseudo-op without a label.
'Error: EQU with forward reference' is received if you
attempt to use a forward reference when equating a label.
'Error: ENDP without PROC'. You must specify where the
procedure begins.
'Error: Missing ENDP'. You must specify where the procedure
ends if PROC is used.
'Error: Procedures nested too deeply'. Only 10 levels of
nesting are allowed.
'Error: Duplicate label'. See section on labels.
'Error: Data too long' indicates use of a byte operand where
the data is out of the range of 0 to 255.
'Error: Too far for short jump' indicates a jump attempt
longer than -128 or +127 bytes.
'Error: Undefined Symbol' plus the operand is displayed
when no match is found in the symbol table. Frequently caused by
bad syntax in the operand.
'Error: Illegal or undefined argument for OFFSET' is similar
to 'Undefined Symbol'.
Two diagnostic messages may be given. The first follows a
syntax error and is 'Specify word or byte operand' if the
assembler could not determine which to use. The opcode must be
corrected by appending a B or W to it. The assembler is not smart
enough to give you this very often. The other message is just a
notice that you used a long jump where you could have used a
short jump and saved a byte of object code.
SOURCE CODE
Turbo Pascal source code for this assembler is available for
those who wish to customize it for their own needs (or those who
would like to see what makes it tick). If you would like a copy
of the source code send a formatted disk and $10 to
George Fulford
RR 1 Box 163c
Shellsburg Ia 52332
Although I have no intention of entering the software market at
this time, I do plan to make corrections to this program as
bugs are found. If you obtain a copy of the source code from me
it will be the most up to date version.
WARRANTY/GUARANTEE
There is NONE.
This assembler runs on my PCjr and since I used all standard
Turbo Pascal it should run on any PC DOS machine. If it does not
I probably won't be able to help you.
I have spent quite a bit of time debugging, but I am sure there are
still a few bugs lurking in the code. I will attempt to stamp out
any that are reported.
UPDATES
Changes made in version 1.01
1) A more sophisticated number parsing routine was added. An
offset may now be added to a label. Multiplication and
subtraction now occur in the correct order.
2) $ can now be used to indicate the present memory location.
3) Due to memory management problems with the PCjr, labels are no
longer stored in a linked list that uses all of the available
memory. Labels are stored in a predefined array of 400. This
should be sufficient for all small programs. The 'Stack Overflow
Error' is no longer applicable.
4) Label storage has been expanded to twelve characters to
improve the readability of labels.
5) A larger input buffer has been allocated to reduce disk access
time, and other minor changes were made in the code to improve
performance.
6) Bugs in the DIV, SHR, SHL and ORB instructions have been
corrected. DIVB and DIVW have been added to allow 8 or 16 bit
divides using indirect addressing.
7) A bug in the line parsing routine that stopped the parsing at
a semicolon enclosed in quotes has been corrected.
8) An error in interpreting [BP] has been corrected.
OP CODE TABLE
Addressing modes supported:
A = accumulator reg (ax, ah, al)
b/w = must add B or W to opcode for this addressing mode
D = displacement (8 or 16 bit as required by the instruction)
I = immediate (byte for 8 bit registers, word for 16 bit reg)
M/R = memory or register indirect addressing
N = none
R = register (bx, cx, dx, bp, si, di)
S = segment register (cs, ds, es, ss)
Op Operand
types
dest. N | A | A | R | R | M/R | R | M/R | I | I | D | M/R
source N | I | M/R | I | N | R | M/R | I | I | N | N | N
AAA x | | | | | | | | | | |
AAD x | | | | | | | | | | |
AAM x | | | | | | | | | | |
AAS x | | | | | | | | | | |
ADC | x | | x | | x | x | b/w | | | |
AND | x | | x | | x | x | b/w | | | |
CALL | | | | | | | | x | | x |
CALLF | | | | | | | | | | | x
CALLN | | | | | | | | | | | x
CBW x | | | | | | | | | | |
CLC x | | | | | | | | | | |
CLD x | | | | | | | | | | |
CLI x | | | | | | | | | | |
CMC x | | | | | | | | | | |
CMP | x | | x | | x | x | b/w | | | |
CMPS b/w | | | | | | | | | | |
CWD x | | | | | | | | | | |
DAA x | | | | | | | | | | |
DAS x | | | | | | | | | | |
DB | | | | | | | | x | x | |
DEC | | | | x | | | | | | | b/w
DIV | | | | x | | | | | | | b/w
DS | | | | | | | | x | x | |
DW | | | | | | | | x | x | |
ENDP x | | | | | | | | | | |
EQU | | | | | | | | | x | | memory
HLT x | | | | | | | | | | |
IDIV | | x | | | | | | | | |
IMUL | | x | | | | | | | | |
IN | x |note1| | | | | | | | |
INC | | | | x | | | | | | | b/w
INT x | | | | | | | | | x | |
INTO x | | | | | | | | | | |
IRET x | | | | | | | | | | |
JA | | | | | | | | | | x |
JAE | | | | | | | | | | x |
JB | | | | | | | | | | x |
JBE | | | | | | | | | | x |
JCXZ | | | | | | | | | | x |
JE | | | | | | | | | | x |
JG | | | | | | | | | | x |
JGE | | | | | | | | | | x |
JL | | | | | | | | | | x |
JLE | | | | | | | | | | x |
JMP | | | | | | | | | | x |
JMPF | | | | | | | | | | x |
JMPN | | | | | | | | | | x |
JMPS | | | | | | | | | | x |
JNE | | | | | | | | | | x |
JNO | | | | | | | | | | x |
JNP | | | | | | | | | | x |
JNS | | | | | | | | | | x |
JNZ | | | | | | | | | | x |
JO | | | | | | | | | | x |
JP | | | | | | | | | | x |
JPE | | | | | | | | | | x |
JPO | | | | | | | | | | x |
JS | | | | | | | | | | x |
JZ | | | | | | | | | | x |
LAHF x | | | | | | | | | | |
LDS | | | | | |note2| | | | |
LEA | | | | | |note2| | | | |
LES | | | | | |note2| | | | |
LOCK x | | | | | | | | | | |
LODS b/w | | | | | | | | | | |
LOOP | | | | | | | | | | x |
LOOPE | | | | | | | | | | x |
LOOPNE | | | | | | | | | | x |
LOOPNZ | | | | | | | | | | x |
LOOPZ | | | | | | | | | | x |
MOV note3| | x | | | x | x | b/w | | | |
MOVS b/w | | | | | | | | | | |
MUL | | x | | | | | | | | |
NEG | | | | x | | | | | | |
NOP x | | | | | | | | | | |
NOT | | | | x | | | | | | | b/w
OR | x | | x | | x | x | b/w | | | |
ORG | | | | | | | | | x | |
OUT | |note1| | | | | | | | |
POP | | | | x | | | | | | |x or seg
POPF x | | | | | | | | | | |
PROC note4| | | | | | | | | | |
PUSH | | | | x | | | | | | |x or seg
PUSHF x | | | | | | | | | | |
RCL | | | | x | | | | | | | b/w
RCR | | | | x | | | | | | | b/w
REP x | | | | | | | | | | |
REPE x | | | | | | | | | | |
REPNE x | | | | | | | | | | |
REPNZ x | | | | | | | | | | |
REPZ x | | | | | | | | | | |
RET x | | | | | | | | | | x |
ROL | | | | x | | | | | | | b/w
ROR | | | | x | | | | | | | b/w
SAHF x | | | | | | | | | | |
SAR | | | | x | | | | | | | b/w
SBB | x | | x | | x | x | b/w | | | |
SCAS b/w | | | | | | | | | | |
SEG | | | | | | | | | x | |
SHL | | | | x | | | b/w | | | |
SHR | | | | x | | | b/w | | | |
STC x | | | | | | | | | | |
STD x | | | | | | | | | | |
STI x | | | | | | | | | | |
STOS b/w | | | | | | | | | | |
SUB | x | | x | | x | x | b/w | | | |
TEST | x | | x | | x | x | b/w | | | |
WAIT x | | | | | | | | | | |
XCHG | |note5| | | x | x | | | | |
XLAT x | | | | | | | | | | |
XOR | x | | x | | x | x | b/w | | | |
note 1 IN/OUT supports DX<-accum (8 or 16) and port<-accum (8 or 16).
note 2 These instructions can use only a memory reference
in the source operand.
note 3 MOV also supports mem<-accum, seg<-M/R and M/R<-seg (or CS).
note 4 Must specify near or far PROCedure.
note 5 The accumulator can be exchanged with any of the registers
using the form XCHG AX,BX or XCHG BX.